The complete code from this tutorial is available here.
The main goals of this tutorial are to:
Fit AutoARIMA model and show the model equation
Fit ARIMA model with seasonality and without
Fit NaïveForecaster
Compare the performance of all of them
Make beautiful visualizations
In statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. Both of these models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting).
First things first, we are going to install and import the modules needed for the tutorial.
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.model_selection import temporal_train_test_split, SlidingWindowSplitter, ForecastingGridSearchCV
from sktime.utils.plotting import plot_series
from sktime.forecasting.naive import NaiveForecaster
from sklearn.metrics import mean_absolute_percentage_error
from sktime.forecasting.arima import ARIMA, AutoARIMA
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import re
import warnings
warnings.filterwarnings("ignore")
# setting graphs size
plt.rcParams["figure.figsize"] = [16,7]
# for fancy plots
plt.style.use('ggplot')
Note: at the time of writing this report, the version of the sktime module was 0.5.3. You may check your version of sktime in the following way:
from sktime import __version__ as sktime_version
print("sktime version", sktime_version)
In the period of a pandemic, it is essential to stay healthy. During such times, any medicine is vital. For this reason, we found a thematic dataset. We have downloaded the dataset called "Monthly anti-diabetic drug sales in Australia from 1991 to 2008", or a10. It contains two columns: the date of the record in format YYYY-MM, and the value, which is the total monthly scripts for pharmaceutical products falling under ATC code A10, as recorded by the Australian Health Insurance Commission.
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
parse_dates=['date'], index_col="date")
We need to specify the frequency before converting the dataframe to a series. Setting the index to the date is common practice: it makes it easier to access values by date, and it is needed for converting to a series and for plotting.
df.index = pd.PeriodIndex(df.index, freq="M")
In this way, we convert a pandas.core.frame.DataFrame object to a pandas.core.series.Series.
series = df.T.iloc[0]
series
As we can see from the table, each row in the dataframe represents a value for each month between July 1991 and June 2008.
We can plot the series, and the fastest method is using plot_series from sktime.utils.plotting. As an alternative, we can use either the .plot() method of the series object or plot with matplotlib, but we will use the first approach for now.
plot_series(series);
We can see that the value rises annually. Also, we may observe a distinct seasonality with a 12-month period, and the amplitude increases with each year. Therefore, it is reasonable to use a model with a seasonality component. We will compare it with a similar model without seasonality and examine the prediction error.
In time series data, seasonality is the presence of variations that occur at specific regular intervals less than a year, such as weekly, monthly, or quarterly. Seasonal fluctuations in a time series can be contrasted with cyclical patterns.
As we can see from the plot, our data is seasonal with one year period.
Let us run the AutoARIMA function from the sktime.forecasting.arima package. It fits all possible combinations of models given the function parameters. We set the sp parameter to 12 because we have seasonal fluctuations every 12 months. We also suppress the warning messages. See the documentation for a more detailed description of the parameters.
model_auto = AutoARIMA(sp=12, suppress_warnings=True).fit(series)
The main result of the model is a summary that provides important information about the parameters of the best found ARIMA model given the hyperparameters. The summary includes the values of the coefficients of the fitted model (with corresponding standard errors, z-statistics and p-values), and some metrics such as AIC and BIC.
summary = model_auto.summary()
summary
According to p-values from this summary, all the coefficients are significant. AIC is 518.98.
Using regular expressions, we can extract the parameters of the model from the text of the summary. Alternatively, we can type them in by hand.
def get_params(summary_text):
    full = re.findall(r'SARIMAX\(.*?\)x\(.*?\)', summary_text)[0]
    info = [int(_) for _ in re.findall(r'\d+', full)]
    return info
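As a quick sanity check, we can run get_params (repeated here so the snippet is self-contained) on an illustrative header string of the kind that appears in the summary text. The actual summary line may include extra formatting, but the digits are extracted the same way.

```python
import re

def get_params(summary_text):
    # find the SARIMAX(p, d, q)x(P, D, Q, S) header in the summary text
    full = re.findall(r'SARIMAX\(.*?\)x\(.*?\)', summary_text)[0]
    # collect every integer in order: p, d, q, P, D, Q, S
    return [int(_) for _ in re.findall(r'\d+', full)]

# illustrative header fragment, not taken from a real model run
print(get_params("SARIMAX(3, 1, 2)x(0, 1, 1, 12)"))  # [3, 1, 2, 0, 1, 1, 12]
```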
Let us extract the best model's parameters from AutoARIMA summary object.
p, d, q, P, D, Q, S = get_params(summary.as_text())
According to the summary, the best model given the function's inputs is $ARIMA(3,1,2)(0,1,1)_{12}$.
While writing the model equation, we checked with "Forecasting: Principles and Practice" by Rob J Hyndman and George Athanasopoulos. The book is in the public domain and is available at the link.
Using the ideas stated above, let us write the $ARIMA(3,1,2)(0,1,1)_{12}$ model equation in the general form:
$$(1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3)(1 - B) (1-B^{12})y_t = (1+\theta_1 B + \theta_2 B^2)(1 + \Theta B^{12}) \epsilon_t$$

Let us expand the brackets:

$$ (1 - B - B \phi_1 + B^2 \phi_1 - B^2 \phi_2 + B^3 \phi_2 - B^3 \phi_3 + B^4 \phi_3 - B^{12} + B^{13} + B^{13} \phi_1 - B^{14}\phi_1 + B^{14}\phi_2 - B^{15} \phi_2 + B^{15}\phi_3 - B^{16}\phi_3)y_t = (1 + B \theta_1 + B^2 \theta_2 + B^{12} \Theta + B^{13}\theta_1 \Theta + B^{14} \theta_2 \Theta) \epsilon_t$$

Note that $B$ is the lag operator, hence $B^n y_t = y_{t-n}$. We use this idea when we apply the operators:

$$ y_t - y_{t-1} - \phi_1 y_{t-1} + \phi_1 y_{t-2} - \phi_2 y_{t-2} + \phi_2 y_{t - 3}- \phi_3 y_{t-3}+ \phi_3 y_{t-4} - y_{t-12} + y_{t-13} + \phi_1 y_{t-13} -\phi_1 y_{t-14}+ \phi_2 y_{t-14} - \phi_2 y_{t-15} + \phi_3 y_{t-15} - \phi_3 y_{t-16} = \epsilon_{t} + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \Theta \epsilon_{t-12} + \theta_1 \Theta \epsilon_{t-13} + \theta_2 \Theta \epsilon_{t-14}$$

We have obtained the equation for $ARIMA(3,1,2)(0,1,1)_{12}$ in general form. Now, let us substitute the coefficients obtained by AutoARIMA:
Sometimes, the orders of ARIMA can be determined using ACF and PACF plots. "The ACF stands for Autocorrelation function and the PACF for Partial Autocorrelation function. Looking at these two plots together can help us form an idea of what models to fit." However, today we focus mostly on sktime, so we do not use them in this tutorial.
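To give a feel for what the ACF measures, here is a minimal hand-rolled sketch of the sample autocorrelation at a given lag (this is an illustration on a toy seasonal series, not sktime's or statsmodels' implementation):

```python
import numpy as np

# toy series: a 12-period sine wave plus a little noise,
# mimicking strong annual seasonality
rng = np.random.default_rng(0)
t = np.arange(120)
y = np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(120)

def acf(x, lag):
    # biased sample autocorrelation at the given lag
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

print(acf(y, 12))  # strongly positive: values 12 months apart move together
print(acf(y, 6))   # strongly negative: half a period apart, they move oppositely
```

A seasonal series produces a spike in the ACF at the seasonal lag, which is exactly what an ACF plot would reveal for the a10 data at lag 12.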
Let us assume that we work for a medicine factory that is trying to predict future demand for A10 products in order to produce the optimal amount. We create a plan for each month, and senior management should approve it. In this context, the train sample represents the target values that were observed before, and the test sample contains the next 24 months. This setup makes sense when we are not able to adjust production during the test period, so we do not use a sliding window approach here.
We are going to predict the values for the last 24 months of observations. Firstly, we will fit the simplest model without seasonality, and then we will check how seasonality improves the fit.
We will use the ARIMA function from the sktime.forecasting.arima module. We will also show the prediction of NaiveForecaster and compare their performance using mean_absolute_percentage_error from sklearn.metrics. The explicit formula for MAPE is
$$ \frac{1}{n}\sum_{i=1}^n \frac{ |y_i - \hat y_i|}{|y_i|} $$
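As a quick numerical check of this formula (on toy numbers, not the drug-sales data), we can compare a by-hand computation against sklearn. Note that mean_absolute_percentage_error expects the true values as the first argument, since they appear in the denominator:

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 180.0, 400.0])

# MAPE by hand: mean of |y_i - yhat_i| / |y_i|
mape_manual = np.mean(np.abs(y_true - y_pred) / np.abs(y_true))
print(mape_manual)  # ≈ 0.0667

# sklearn's implementation, with y_true first
print(mean_absolute_percentage_error(y_true, y_pred))  # ≈ 0.0667
```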
Read this to learn more about the metrics for time series.
We create train and test samples. The train sample represents the values that we observed; we will fit the model using these observations. The test sample represents the observations that our model has not seen yet and is going to predict.
y_train, y_test = temporal_train_test_split(series, test_size=24)
An important step in predicting with sktime is the creation of a ForecastingHorizon object based on y_test.
fh = ForecastingHorizon(y_test.index, is_relative=False)
Let us show our split on the plot. An interesting remark is that the data's greatest fluctuations are in the right part, i.e. in the test sample. It will be interesting to see how the models cope with it.
plot_series(y_train, y_test, labels=['Train', 'Test']);
Now we are ready to fit the models and predict values for each one. The first model is NaiveForecaster with a seasonal periodicity equal to 12. We use last strategy because it is the default one. As an alternative, the user can use mean or drift. For more information about strategies, check the documentation.
.fit() is used to train the model using observations from the train sample.
model = NaiveForecaster(strategy="last", sp=12).fit(y_train)
In order to get the forecast, we use the .predict() method, passing the ForecastingHorizon fh as an argument.
y_pred = model.predict(fh)
The output is also a Series object.
print(type(y_pred))
print(y_pred)
We can visually see how well the model performed on this data. To do so, let us plot train, test and predicted data.
plot_series(y_train, y_test, y_pred, labels=['Train', 'Test', 'Predicted']);
We may also look only at the period we needed to forecast.
plot_series(y_test, y_pred, labels=['Test', 'Predicted']);
As the chart above illustrates, the naïve model with seasonality demonstrates a reasonably good fit, repeating some patterns from the initial data. However, the model underestimates the data at all points.
We calculate MAPE in order to compare the metrics later and determine which model performed best on the data.
mape_naive = mean_absolute_percentage_error(y_test, y_pred)
mape_naive
The second and third models in our list are ARIMA of the same order, without and with seasonality.
We use the ARIMA function from sktime.forecasting.arima and specify the order of the model using a tuple. Recall that for $ARIMA(p, d, q)$: $p$ is the order of the autoregressive (AR) part, $d$ is the degree of differencing, and $q$ is the order of the moving average (MA) part.
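In lag-operator notation, the general (non-seasonal) $ARIMA(p, d, q)$ equation, written with the same sign conventions as the seasonal equation above, is

$$(1 - \phi_1 B - \dots - \phi_p B^p)(1 - B)^d y_t = (1 + \theta_1 B + \dots + \theta_q B^q) \epsilon_t$$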
We again use .fit() method. For more details, consult the documentation.
model = ARIMA(order = (p, d, q)).fit(y_train)
Every sktime model has a predict() method that forecasts values based on the forecasting horizon. ARIMA has an optional parameter return_pred_int that also returns the prediction intervals.
y_pred, y_conf = model.predict(fh, return_pred_int=True)
As before, we can print the summary of the model.
model.summary()
It is always useful to analyse the performance of time series models using graphs. We have shown you how to plot series using plot_series. Now, as promised, we will demonstrate how to do the same with matplotlib. This method requires writing more code; however, you can configure it more flexibly.
Let us plot the initial data and predicted data together. The shaded area represents the confidence intervals.
y_train.plot(label='Train')
y_test.plot(label='Test')
y_pred.plot(label='Predicted')
plt.fill_between(y_conf.index, y_conf.lower, y_conf.upper, color= 'magenta', alpha=.3)
plt.scatter(y_train.index, y_train)
plt.scatter(y_test.index, y_test)
plt.scatter(y_pred.index, y_pred)
plt.legend()
plt.show()
Take a closer look.
y_test.plot(label='Test')
y_pred.plot(label='Predicted')
plt.fill_between(y_conf.index, y_conf.lower, y_conf.upper, color= 'magenta', alpha= .3)
plt.scatter(y_test.index, y_test)
plt.scatter(y_pred.index, y_pred)
plt.legend()
plt.show()
As we can observe, the ARIMA model without seasonality performed rather poorly. It increases along with the initial series; however, it replicates none of the fluctuations in the initial data.
Calculate MAPE for future comparison.
mape_arima = mean_absolute_percentage_error(y_test, y_pred)
mape_arima
Finally, we reached the model that is expected to show the best performance on the data.
Note that even though the ARIMA function is the same as before, we specify the parameter seasonal_order to make it seasonal.
model = ARIMA(order = (p, d, q), seasonal_order = (P, D, Q, S)).fit(y_train)
Let us obtain the predicted values and confidence intervals by passing fh as the main argument to the predict() method and setting return_pred_int to True.
y_pred, y_conf = model.predict(fh, return_pred_int=True)
model.summary()
Again, let us plot the train, test and predicted data together with matplotlib.
y_train.plot(label='Train')
y_test.plot(label='Test')
y_pred.plot(label='Predicted')
plt.fill_between(y_conf.index, y_conf.lower, y_conf.upper, color= 'magenta', alpha=.3)
plt.scatter(y_train.index, y_train)
plt.scatter(y_test.index, y_test)
plt.scatter(y_pred.index, y_pred)
plt.legend()
plt.show()
At first glance, this model performed magnificently. Let us take a closer look.
y_test.plot(label='Test')
y_pred.plot(label='Predicted')
plt.fill_between(y_conf.index, y_conf.lower, y_conf.upper, color= 'magenta', alpha= .3)
plt.scatter(y_test.index, y_test)
plt.scatter(y_pred.index, y_pred)
plt.legend()
plt.show()
Indeed, taking into account seasonality improves the model prediction significantly. Apart from replicating the main trends and fluctuations of the initial data, this model predicts the peak point rather accurately.
mape_arima_seas = mean_absolute_percentage_error(y_test, y_pred)
mape_arima_seas
Let us compare the metrics for all three models and determine which is the best. As expected, the last model did the best, and its MAPE is the lowest. Interestingly, ARIMA without seasonality achieved a better score than NaiveForecaster, even though visually the latter showed a better fit.
print("Naive Model \t \t", mape_naive)
print("Arima no Seas.\t \t", mape_arima)
print("Arima with Seas.\t", mape_arima_seas)
As we can see, both from the plots and the metrics, the ARIMA model with seasonality demonstrated the best fit. It is important to note that, visually, the naïve forecaster with seasonality was only slightly worse than the ARIMA model with seasonality.
Remark. We noticed that the amplitude increases each year. One interesting approach would be to apply a logarithm transformation to the initial series. The data will still increase; however, the amplitude of the fluctuations won't.
series_log = series.apply(np.log)
plot_series(series_log);
Now let us assume that we can change the production plan of the A10 medicine each month based on sales during the previous n months. We will perform the prediction using the sliding window approach.
For example, the train sample on each iteration looks like this:

"Sliding window is the way to restructure a time series dataset as a supervised learning problem. Multivariate and multi-step forecasting time series can also be framed as supervised learning using the sliding window method."
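Conceptually, the idea in the quote can be sketched in a few lines of plain Python (a minimal illustration of restructuring a series into windowed train/test pairs; the window_length and horizon values here are illustrative, not the ones used below):

```python
import numpy as np

y = np.arange(10)          # toy "series" of 10 observations
window_length, horizon = 4, 1

# each split: a window of past values and the next `horizon` values to predict
splits = [(y[i:i + window_length], y[i + window_length:i + window_length + horizon])
          for i in range(len(y) - window_length - horizon + 1)]

print(splits[0])  # (array([0, 1, 2, 3]), array([4]))
print(len(splits))  # 6
```

sktime's SlidingWindowSplitter generates index splits of this kind for us, so we never have to build them by hand.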
Let us implement it in python. We use SlidingWindowSplitter from sktime.
We will start by forecasting with the ARIMA model without seasonality. For easier usage of the above-mentioned SlidingWindowSplitter, we will use ForecastingGridSearchCV:
fh = ForecastingHorizon(series[-24:].index, is_relative=False)
y_predA, y_confA = ForecastingGridSearchCV(
    ARIMA(),
    SlidingWindowSplitter(window_length=48, start_with_window=True, initial_window=48),
    {'order': [(p, d, q)]}, n_jobs=-1).fit(series).predict(fh, return_pred_int=True)
Now, let us do the same, but with seasonality introduced in our model:
y_predAS, y_confAS = ForecastingGridSearchCV(
    ARIMA(),
    SlidingWindowSplitter(window_length=48, start_with_window=True, initial_window=48),
    {'order': [(p, d, q)], 'seasonal_order': [(P, D, Q, S)]},
    n_jobs=-1).fit(series).predict(fh, return_pred_int=True)
And the naive forecaster for good measure:
y_predN = ForecastingGridSearchCV(
    NaiveForecaster(),
    SlidingWindowSplitter(window_length=48, start_with_window=True, initial_window=48),
    {'strategy': ["last"], 'sp': [12]}, n_jobs=-1).fit(series).predict(fh)
We can now look at the resulting errors. As you can see, the model with seasonality gave us the best performance, while the naive approach, quite predictably, gave the worst result.
errors_arima = (mean_absolute_percentage_error(series[-24:], y_predA))
errors_arima_seas = (mean_absolute_percentage_error(series[-24:], y_predAS))
errors_naive = (mean_absolute_percentage_error(series[-24:], y_predN))
print("Naive Model \t \t", np.mean(errors_naive))
print("Arima no Seas.\t \t", np.mean(errors_arima))
print("Arima with Seas.\t", np.mean(errors_arima_seas))
We may see that the prediction with a sliding window expectedly showed a better fit than the prediction using only one train and one test sample. However, as before, let us check the model performance on the graphs.
for y_pred, y_conf, title in [(y_predA, y_confA, 'Arima without seasonality'),
                              (y_predAS, y_confAS, 'Arima with seasonality'),
                              (y_predN, False, 'Naive seasonal')]:
    # full view: train, test and prediction
    plt.figure(figsize=(16,4))
    plt.title(title)
    series[:-24].plot(label='Train')
    series[-24:].plot(label='Test')
    y_pred.plot(label='Predicted')
    if type(y_conf) != bool:
        plt.fill_between(y_conf.index, y_conf.lower, y_conf.upper, color='magenta', alpha=.3)
    plt.scatter(series[:-24].index, series[:-24])
    plt.scatter(series[-24:].index, series[-24:])
    plt.scatter(y_pred.index, y_pred)
    plt.legend()
    plt.show()
    print("-"*150)
    # zoomed view: test period only
    plt.figure(figsize=(16,4))
    series[-24:].plot(label='Test')
    y_pred.plot(label='Predicted')
    if type(y_conf) != bool:
        plt.fill_between(y_conf.index, y_conf.lower, y_conf.upper, color='magenta', alpha=.3)
    plt.scatter(series[-24:].index, series[-24:])
    plt.scatter(y_pred.index, y_pred)
    plt.legend()
    plt.show()
As the plots above show, all of the models performed quite decently.
So, as a result of our work, we have looked at various methods of forecasting data with seasonality. We have gone through ARIMA models with and without seasonality. The conclusion is evident: the model with the seasonality set correctly gave superior performance compared with the more trivial ones, further proving the point that it is extremely important to review the data before the analysis.

P.S. It was really exciting and inspiring to work on this tutorial. We've learned a lot and are looking forward to continuing to work on such tasks.